1. Introduction.

The nearest neighbor (NN) rule for classifying an observation $Z$ into one of two given populations (or classes) $\pi_1$ and $\pi_2$ was first introduced by Fix and Hodges [3]. The rule may be described as follows.

Let $(X_1, \ldots, X_{n_1})$ and $(Y_1, \ldots, Y_{n_2})$ be random (training) samples from $\pi_1$ and $\pi_2$, respectively. Using a distance function $d$, rank the distances of all the observations from $Z$. Classify $Z$ into the population to which the nearest neighbor of $Z$ belongs. This rule was also studied by Cover and Hart [2], based on an identified training sample from a mixture of $\pi_1$ and $\pi_2$.
We shall first suggest a rule which uses the above idea in terms of the ranks of the observations in the pooled sample (including $Z$). The rule is especially useful when the observations are available only in terms of their ranks. The rule described below will be called the "rank nearest neighbor" (RNN) rule.

Pool the observations $X_i$'s, $Y_j$'s and $Z$ and note their ranks.
(i) If $Z$ is either the smallest or the largest observation, classify $Z$ into the class of its nearest neighbor.
(ii) If both the left-hand and the right-hand neighbors of $Z$ (denoted, respectively, by $U_1$ and $V_1$) belong to the same class, classify $Z$ into that class.
(iii) If $U_1$ and $V_1$ belong to different classes, classify $Z$ into either of the two classes with probabilities $\tfrac12$ and $\tfrac12$. (We shall call this a "tie".)
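A minimal Python sketch of the one-stage RNN rule (i)-(iii), assuming real-valued observations with no tied values (as under continuous $F_1$, $F_2$). The function name `rnn_classify` and the labels `1` and `2` for $\pi_1$ and $\pi_2$ are hypothetical choices for illustration. Only the relative order of the pooled values is used, so the same code applies when only ranks are available.

```python
import bisect
import random
from typing import Optional, Sequence


def rnn_classify(z: float, xs: Sequence[float], ys: Sequence[float],
                 rng: Optional[random.Random] = None) -> int:
    """One-stage rank nearest neighbor (RNN) rule (hypothetical helper).

    Returns 1 for population pi_1 (X-sample) or 2 for pi_2 (Y-sample).
    Assumes no two pooled values coincide.
    """
    if rng is None:
        rng = random.Random()
    # Pool the training sample, remembering which population each value came from.
    pooled = sorted([(x, 1) for x in xs] + [(y, 2) for y in ys])
    values = [v for v, _ in pooled]
    k = bisect.bisect_left(values, z)  # number of training values below z
    if k == 0:                 # (i) z is the smallest: its only neighbor is on the right
        return pooled[0][1]
    if k == len(pooled):       # (i) z is the largest: its only neighbor is on the left
        return pooled[-1][1]
    left, right = pooled[k - 1][1], pooled[k][1]  # classes of U_1 and V_1
    if left == right:          # (ii) both neighbors come from the same class
        return left
    return rng.choice((1, 2))  # (iii) a tie: each class with probability 1/2
```

For example, with `xs = [0.0, 1.0]` and `ys = [5.0]`, the point `z = 0.5` lies between two $X$-observations, so `rnn_classify(0.5, xs, ys)` returns `1`.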


In Section 2 the asymptotic (as $n_1, n_2 \to \infty$) values of the probabilities of misclassification (PMC's) of the RNN rule are derived. It turns out that these asymptotic values are the same as the corresponding asymptotic PMC's of the NN rule (see [3]).

To reduce the chance of randomization in the RNN rule, we consider a multi-stage version as follows. If the first-stage RNN rule (described above) leads to a tie, we delete the two observations corresponding to $U_1$ and $V_1$ and apply the first-stage rule to the remaining observations. We proceed in this way, moving to the next stage whenever a tie occurs and applying the first-stage rule after deleting all the observations that correspond to the left-hand and the right-hand neighbors in the previous stages. The $M$-stage RNN rule is defined to be the one which terminates at the $M$th stage (and allows for a tie in this final stage). In Section 3 the asymptotic PMC's of the $M$-stage RNN rule are derived.
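A corresponding sketch of the $M$-stage rule, under the same assumptions (no tied values; the name `m_stage_rnn_classify` is hypothetical). Because the neighbors of the earlier, tied stages are deleted, the neighbors at stage $i$ are simply the $i$th closest pooled values on each side of $Z$. If no observation at all remains on either side (a case Lemma 2.1 below makes asymptotically negligible), the sketch randomizes; that fallback is our assumption, not part of the rule as stated.

```python
import bisect
import random
from typing import Optional, Sequence


def m_stage_rnn_classify(z: float, xs: Sequence[float], ys: Sequence[float],
                         m: int, rng: Optional[random.Random] = None) -> int:
    """M-stage RNN rule (hypothetical helper); returns 1 (pi_1) or 2 (pi_2).

    At stage i the neighbors U_i, V_i are the i-th closest pooled values to
    the left and to the right of z, since earlier tied neighbors were deleted.
    """
    if rng is None:
        rng = random.Random()
    pooled = sorted([(x, 1) for x in xs] + [(y, 2) for y in ys])
    k = bisect.bisect_left([v for v, _ in pooled], z)
    below, above = pooled[:k], pooled[k:]   # values to the left / right of z
    for i in range(m):                      # stage i + 1
        has_left, has_right = i < len(below), i < len(above)
        if has_left and has_right:
            u_label, v_label = below[-1 - i][1], above[i][1]
            if u_label == v_label:
                return u_label
            continue                        # tie: delete U_i, V_i and go on
        if has_left:                        # z is extreme at this stage
            return below[-1 - i][1]
        if has_right:
            return above[i][1]
        break                               # assumption: nothing left, randomize
    return rng.choice((1, 2))               # tie at the final stage M
```

For instance, with `xs = [0.0, -1.0, 2.0]` and `ys = [1.0]`, the point `z = 0.5` gives a tie at the first stage (neighbors `0.0` and `1.0`), but both second-stage neighbors (`-1.0` and `2.0`) are $X$-observations, so the two-stage rule returns `1`.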

The  above  rule  can  also  be  described  in  terms  of  tolerance  regions 
based  on  the  pooled  training  sample.  The  basic  idea  was  suggested  by 
Anderson  [1]. 

We shall denote the c.d.f.'s of $\pi_1$ and $\pi_2$ by $F_1$ and $F_2$, respectively, and we shall assume that $F_i$ possesses a density function $f_i$ with respect to the Lebesgue measure. It is also assumed that the density of $Z$ is either $f_1$ or $f_2$.
2. Asymptotic PMC's of the one-stage RNN rule.

The following lemma leads us to assume, without loss of generality at least for asymptotic results, that the right-hand and the left-hand neighbors of $Z$ at the $M$th stage (denoted by $V_M$ and $U_M$, respectively) are well-defined. Let $n = \min(n_1, n_2)$.

Lemma 2.1. If $M/n \to 0$ as $n \to \infty$, the probability (under either $Z \sim f_1$ or $Z \sim f_2$) that there are at least $M$ observations to the right of $Z$ and at least $M$ observations to the left of $Z$ in the training sample for all sufficiently large $n$ is one.


Proof. Since $F_1$ and $F_2$ are continuous, the probability that either $0 < F_1(Z) < 1$ or $0 < F_2(Z) < 1$ occurs is one. Suppose, in particular, that $0 < F_1(z) < 1$. It is then sufficient to prove that the probability of the event stated in the lemma, conditioned by $Z = z$, is one for all $z$ such that $0 < F_1(z) < 1$. Define

$$W_i = I(X_i > z), \qquad i = 1, \ldots, n_1, \tag{2.1}$$

where $I$ is the indicator function. Then $E(W_i) = 1 - F_1(z) > 0$. By the strong law of large numbers, we have

$$P\Big[\frac{1}{n_1}\sum_{i=1}^{n_1} W_i \to E(W_1) \text{ as } n_1 \to \infty\Big] = 1. \tag{2.2}$$

Since $M/n_1 \to 0$ (because $n \le n_1$),

$$P\Big[\sum_{i=1}^{n_1} W_i \ge M \text{ for all sufficiently large } n_1\Big] = 1. \tag{2.3}$$

The corresponding result for the left-hand neighbor of $Z$ can be proved similarly.


Next we shall prove that $U_M$ and $V_M$ tend to $Z$ almost surely as $n \to \infty$.


Lemma 2.2. Given that $Z$ is distributed as $F_1$, both $U_M$ and $V_M$ converge to $Z$ almost surely as $n_1 \to \infty$ and $M/n_1 \to 0$.


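Proof. Let

$$S_1 = \{z : F_1(z+\varepsilon) - F_1(z) > 0,\ F_1(z) - F_1(z-\varepsilon) > 0 \text{ for all } \varepsilon > 0\}.$$

Then

$$P(Z \in S_1) = 1.$$

This follows from the fact that the set of intervals in which $F_1$ is constant is at most countable. Thus the set of endpoints of these intervals has $F_1$-measure zero, since $F_1$ is continuous. Thus for $z \in S_1$,

$$F_1(z) - F_1(z-\varepsilon) > 0 \tag{2.4}$$

for every $\varepsilon > 0$. We shall now prove that, given $Z = z \in S_1$, $U_M \to z$ a.s. We have

$$P[U_M < z - \varepsilon] \le P[W < M], \tag{2.5}$$

where $W$ is the number of $X_i$'s in $(z-\varepsilon, z)$. Given $\varepsilon > 0$, $n_1$ can be chosen sufficiently large so that

$$[F_1(z) - F_1(z-\varepsilon)] - M/n_1 > \eta > 0. \tag{2.6}$$

Hence, by Hoeffding's inequality,

$$P[W < M] < \exp(-2 n_1 \eta^2) \tag{2.7}$$

for all sufficiently large $n_1$. Since $\sum_{n_1} \exp(-2 n_1 \eta^2) < \infty$, the Borel-Cantelli lemma shows that, with probability one, $U_M \ge z - \varepsilon$ for all sufficiently large $n_1$; as $U_M < z$ and $\varepsilon > 0$ is arbitrary, $U_M \to z$ a.s. Similarly, it can be shown that $V_M \to z$ a.s. as $n_1 \to \infty$ for $z \in S_1$. The lemma now follows easily.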

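Let $U_i$ and $V_i$ be the left-hand and the right-hand neighbors of $Z$ at the $i$th stage. Define $\varphi_i = \varphi_i(Z; X_j\text{'s}, Y_l\text{'s};\ j = 1, \ldots, n_1,\ l = 1, \ldots, n_2)$ by

$$\varphi_i = \begin{cases} 1, & \text{if both } U_i \text{ and } V_i \text{ are } X\text{-observations, or } Z \text{ is an extreme observation at the } i\text{th stage and its NN is an } X\text{-observation},\\ \tfrac12, & \text{if } U_i \text{ and } V_i \text{ belong to different classes},\\ 0, & \text{otherwise}. \end{cases} \tag{2.8}$$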

Let $A_i$ be the event that both $U_i$ and $V_i$ are well-defined at the $i$th stage. The conditional probability of deciding that $Z$ comes from $\pi_1$ using the one-stage RNN rule, given $Z = z$, is

$$\pi^{(1)}(z; n_1, n_2) = E[\varphi_1 \mid Z = z] = E[\varphi_1 I_{A_1} \mid Z = z] + E[\varphi_1 I_{A_1^c} \mid Z = z]. \tag{2.9}$$


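However,

$$E[\varphi_1 I_{A_1^c} \mid Z = z] \le P(A_1^c \mid Z = z) \to 0 \tag{2.10}$$

by Lemma 2.1 for almost all $z$. Now note that

$$E[\varphi_1 I_{A_1} \mid Z = z] = P[(\varphi_1 = 1) \cap A_1 \mid Z = z] + \tfrac12 P[(\varphi_1 = \tfrac12) \cap A_1 \mid Z = z] = E[p^{(11)}_{n_1,n_2}(U_1, V_1, z) I_{A_1} \mid Z = z] + \tfrac12 E[p^{(10)}_{n_1,n_2}(U_1, V_1, z) I_{A_1} \mid Z = z], \tag{2.11}$$

where

$$p^{(11)}_{n_1,n_2}(u, v, z) = P[\varphi_1 = 1 \mid Z = z, U_1 = u, V_1 = v, A_1], \tag{2.12}$$

$$p^{(10)}_{n_1,n_2}(u, v, z) = P[\varphi_1 = \tfrac12 \mid Z = z, U_1 = u, V_1 = v, A_1]. \tag{2.13}$$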


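Now it can be seen that

$$p^{(11)}_{n_1,n_2}(u, v, z) = C_1(n_1, n_2)/B(n_1, n_2), \tag{2.14}$$

$$p^{(10)}_{n_1,n_2}(u, v, z) = C_0(n_1, n_2)/B(n_1, n_2), \tag{2.15}$$

where

$$C_1(n_1, n_2) = n_1(n_1-1)[1 - \{F_1(v) - F_1(u)\}]^{n_1-2}[1 - \{F_2(v) - F_2(u)\}]^{n_2} f_1(u) f_1(v), \tag{2.16}$$

$$C_2(n_1, n_2) = n_2(n_2-1)[1 - \{F_1(v) - F_1(u)\}]^{n_1}[1 - \{F_2(v) - F_2(u)\}]^{n_2-2} f_2(u) f_2(v), \tag{2.17}$$

$$C_0(n_1, n_2) = n_1 n_2 [1 - \{F_1(v) - F_1(u)\}]^{n_1-1}[1 - \{F_2(v) - F_2(u)\}]^{n_2-1}\{f_1(u) f_2(v) + f_2(u) f_1(v)\}, \tag{2.18}$$

$$B(n_1, n_2) = C_1(n_1, n_2) + C_2(n_1, n_2) + C_0(n_1, n_2). \tag{2.19}$$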
Let

$$p_i = \lim_{n\to\infty} n_i/(n_1 + n_2), \qquad i = 1, 2. \tag{2.20}$$

We assume that $0 < p_1 < 1$.
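The limits behind Theorem 2.1 below can be seen numerically. The Python sketch evaluates $C_1/B$ and $C_0/B$ from (2.14)-(2.19) and compares them with $\eta_1$ and $\eta_0$ of (2.22)-(2.23). Everything concrete here is an assumption for illustration: the populations $\pi_1 = N(0,1)$ and $\pi_2 = N(1,1)$, the point $z = 0.3$, $n_1/(n_1+n_2) = 0.6$, neighbors placed symmetrically at $z \pm h$ with $h$ shrinking as $n$ grows, and the function names.

```python
import math
from typing import Callable, Tuple

Fn = Callable[[float], float]


def neighbor_class_probs(u: float, v: float, n1: int, n2: int,
                         f1: Fn, F1: Fn, f2: Fn, F2: Fn) -> Tuple[float, float]:
    """(p11, p10) = (C_1/B, C_0/B) of (2.14)-(2.19) for neighbors u < z < v.

    The common factor [1-{F1(v)-F1(u)}]**n1 * [1-{F2(v)-F2(u)}]**n2 is
    divided out of C_1, C_2 and C_0 first, so large n1, n2 do not underflow.
    """
    g1 = 1.0 - (F1(v) - F1(u))
    g2 = 1.0 - (F2(v) - F2(u))
    c1 = n1 * (n1 - 1) * f1(u) * f1(v) / g1 ** 2
    c2 = n2 * (n2 - 1) * f2(u) * f2(v) / g2 ** 2
    c0 = n1 * n2 * (f1(u) * f2(v) + f2(u) * f1(v)) / (g1 * g2)
    b = c1 + c2 + c0
    return c1 / b, c0 / b


def eta(z: float, p1: float, f1: Fn, f2: Fn) -> Tuple[float, float]:
    """Limits (eta_1, eta_0) of (2.22)-(2.23)."""
    p2 = 1.0 - p1
    s = p1 * f1(z) + p2 * f2(z)
    return (p1 * f1(z) / s) ** 2, 2.0 * p1 * p2 * f1(z) * f2(z) / s ** 2


# Assumed example populations: pi_1 = N(0, 1) and pi_2 = N(1, 1).
def f1(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def F1(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f2(x: float) -> float:
    return f1(x - 1.0)


def F2(x: float) -> float:
    return F1(x - 1.0)


if __name__ == "__main__":
    z = 0.3
    for n1, n2, h in [(60, 40, 0.05), (600, 400, 0.005), (6000, 4000, 0.0005)]:
        p11, p10 = neighbor_class_probs(z - h, z + h, n1, n2, f1, F1, f2, F2)
        print(n1 + n2, round(p11, 4), round(p10, 4))
    print("limits", [round(e, 4) for e in eta(z, 0.6, f1, f2)])
```

Dividing out the common power is only a numerical convenience: it cancels in both ratios $C_1/B$ and $C_0/B$.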

Theorem 2.1. Suppose $z$ is a point of continuity of both $f_1$ and $f_2$, and $f_1(z) f_2(z) \ne 0$. Then for almost all $z$ (under $f_1$ or $f_2$)

$$\pi^{(1)}(z) = \lim_{n\to\infty} \pi^{(1)}(z; n_1, n_2) = \eta_1 + \tfrac12\eta_0, \tag{2.21}$$

where

$$\eta_1 = p_1^2 f_1^2(z)/\{p_1 f_1(z) + p_2 f_2(z)\}^2, \tag{2.22}$$

$$\eta_0 = 2 p_1 p_2 f_1(z) f_2(z)/\{p_1 f_1(z) + p_2 f_2(z)\}^2. \tag{2.23}$$

Proof. When $u, v \to z$ and $n \to \infty$,

$$p^{(11)}_{n_1,n_2}(u, v, z) \to \eta_1, \qquad p^{(10)}_{n_1,n_2}(u, v, z) \to \eta_0.$$

The desired result now follows from (2.11), (2.10), (2.9), Lemma 2.2, and the dominated convergence theorem.
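Substituting (2.22) and (2.23) into (2.21) gives a simpler form of the limit. The rearrangement below is our own one-line computation: $\pi^{(1)}(z)$ is the posterior probability of $\pi_1$ under prior weights $p_1, p_2$, which is consistent with the remark in Section 1 that the RNN and NN rules have the same asymptotic PMC's.

$$\pi^{(1)}(z) = \eta_1 + \tfrac12\eta_0 = \frac{p_1^2 f_1^2(z) + p_1 p_2 f_1(z) f_2(z)}{\{p_1 f_1(z) + p_2 f_2(z)\}^2} = \frac{p_1 f_1(z)\{p_1 f_1(z) + p_2 f_2(z)\}}{\{p_1 f_1(z) + p_2 f_2(z)\}^2} = \frac{p_1 f_1(z)}{p_1 f_1(z) + p_2 f_2(z)}.$$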

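The limiting PMC's of the one-stage RNN rule are given as follows:

$$\alpha_1^{(1)} = \lim_{n\to\infty} P(\text{Decide } Z \in \pi_2 \mid Z \in \pi_1) = \int [1 - \pi^{(1)}(z)] f_1(z)\,dz = \int [p_2 f_1(z) f_2(z)/\{p_1 f_1(z) + p_2 f_2(z)\}]\,dz, \tag{2.26}$$

$$\alpha_2^{(1)} = \lim_{n\to\infty} P(\text{Decide } Z \in \pi_1 \mid Z \in \pi_2) = \int \pi^{(1)}(z) f_2(z)\,dz = \int [p_1 f_1(z) f_2(z)/\{p_1 f_1(z) + p_2 f_2(z)\}]\,dz. \tag{2.27}$$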
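The two limiting PMC's in (2.26) and (2.27) are one-dimensional integrals and are easy to evaluate numerically. The Python sketch below uses the midpoint rule; the truncation of the real line to $[-12, 12]$, the normal example densities and the function names are assumptions. A simple sanity check is the case $f_1 = f_2$, where $\pi^{(1)}(z) = p_1$ and hence the two PMC's reduce to $p_2$ and $p_1$.

```python
import math
from typing import Callable, Tuple

Fn = Callable[[float], float]


def limiting_pmcs(p1: float, f1: Fn, f2: Fn, lo: float = -12.0,
                  hi: float = 12.0, steps: int = 24000) -> Tuple[float, float]:
    """Midpoint-rule approximation of the limiting PMC's (2.26) and (2.27).

    The range [lo, hi] is an assumption: it must carry essentially all of
    the mass of f1 and f2.
    """
    p2 = 1.0 - p1
    h = (hi - lo) / steps
    a1 = a2 = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * h
        s = p1 * f1(z) + p2 * f2(z)
        if s == 0.0:
            continue
        common = f1(z) * f2(z) / s
        a1 += p2 * common * h   # integrand of (2.26): [1 - pi^(1)(z)] f1(z)
        a2 += p1 * common * h   # integrand of (2.27): pi^(1)(z) f2(z)
    return a1, a2


def normal_pdf(x: float, mu: float = 0.0) -> float:
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    # Assumed example: pi_1 = N(0, 1), pi_2 = N(1, 1), p_1 = 0.6.
    print(limiting_pmcs(0.6, normal_pdf, lambda x: normal_pdf(x, 1.0)))
```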
When the training sample is an identified sample from the mixture of $\pi_1$ and $\pi_2$ with mixture proportions $\xi_1$ and $\xi_2$, respectively, we may take $p_i = \xi_i$ $(i = 1, 2)$. Then the limiting value of the total PMC (or the Bayes' risk) of the one-stage RNN rule is

$$r^{(1)} = \xi_1\alpha_1^{(1)} + \xi_2\alpha_2^{(1)} = \int [2\xi_1\xi_2 f_1(z) f_2(z)/\{\xi_1 f_1(z) + \xi_2 f_2(z)\}]\,dz. \tag{2.28}$$

If $F_1$ and $F_2$ were known, the minimum value of the total PMC (or the risk of a Bayes' rule) is given by

$$r^* = \int \min[\xi_1 f_1(z), \xi_2 f_2(z)]\,dz. \tag{2.29}$$

It can be seen easily that

$$r^* \le r^{(1)} \le 2r^*.$$

See [2]. It may be noted that the result of Theorem 2.1 holds almost everywhere for $z$ such that $f_1(z) + f_2(z) > 0$, instead of $f_1(z) f_2(z) > 0$.
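The bound $r^* \le r^{(1)} \le 2r^*$ can be checked pointwise. Writing $a = \xi_1 f_1(z)$ and $b = \xi_2 f_2(z)$ (our shorthand), the integrands of (2.28) and (2.29) satisfy, whenever $a + b > 0$,

$$\min(a, b) \le \frac{2ab}{a + b} = \min(a, b)\cdot\frac{2\max(a, b)}{a + b} \le 2\min(a, b),$$

since $\tfrac12 \le \max(a, b)/(a + b) \le 1$. Integrating over $z$ gives the two inequalities.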


3. Limiting PMC's of the M-stage RNN rule.

Let $\pi^{(M)}(z; n_1, n_2)$ be the conditional probability that the $M$-stage RNN rule classifies $Z$ into $\pi_1$, given $Z = z$. Let

$$\pi^{(M)}(z) = \lim_{n\to\infty} \pi^{(M)}(z; n_1, n_2). \tag{3.1}$$

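Recall the definition of $\varphi_i$ given in (2.8). Then

$$\pi^{(M)}(z; n_1, n_2) = P[\varphi_1 = 1 \mid Z = z] + \sum_{i=2}^{M} P[\varphi_1 = \tfrac12, \ldots, \varphi_{i-1} = \tfrac12, \varphi_i = 1 \mid Z = z] + \tfrac12 P[\varphi_1 = \tfrac12, \ldots, \varphi_M = \tfrac12 \mid Z = z], \tag{3.2}$$

where

$$P[\varphi_1 = \tfrac12, \ldots, \varphi_{i-1} = \tfrac12, \varphi_i = 1 \mid Z = z] = P[\varphi_i = 1 \mid \varphi_1 = \tfrac12, \ldots, \varphi_{i-1} = \tfrac12, Z = z] \prod_{j=2}^{i-1} P[\varphi_j = \tfrac12 \mid \varphi_1 = \tfrac12, \ldots, \varphi_{j-1} = \tfrac12, Z = z]\; P[\varphi_1 = \tfrac12 \mid Z = z], \tag{3.3}$$

$$P[\varphi_1 = \tfrac12, \ldots, \varphi_M = \tfrac12 \mid Z = z] = \prod_{j=2}^{M} P[\varphi_j = \tfrac12 \mid \varphi_1 = \tfrac12, \ldots, \varphi_{j-1} = \tfrac12, Z = z]\; P[\varphi_1 = \tfrac12 \mid Z = z]. \tag{3.4}$$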


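We shall show that, under certain conditions,

$$\lim_{n\to\infty} P[\varphi_i = \tfrac12 \mid \varphi_1 = \tfrac12, \ldots, \varphi_{i-1} = \tfrac12, Z = z] = \lim_{n\to\infty} P[\varphi_1 = \tfrac12 \mid Z = z] = \eta_0, \tag{3.5}$$

$$\lim_{n\to\infty} P[\varphi_i = 1 \mid \varphi_1 = \tfrac12, \ldots, \varphi_{i-1} = \tfrac12, Z = z] = \lim_{n\to\infty} P[\varphi_1 = 1 \mid Z = z] = \eta_1, \tag{3.6}$$

where $\eta_1$ and $\eta_0$ are given by (2.22) and (2.23). Then

$$\pi^{(M)}(z) = \eta_1 \sum_{i=0}^{M-1} \eta_0^i + \tfrac12\eta_0^M. \tag{3.7}$$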
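In the limit, (3.3)-(3.6) say that the successive stages behave like independent trials with outcomes "both neighbors are $X$-observations" (probability $\eta_1$), "tie" ($\eta_0$) and "both neighbors are $Y$-observations" ($1 - \eta_1 - \eta_0$). The Python sketch below, with hypothetical function names and arbitrary illustrative values of $\eta_1$, $\eta_0$, evaluates (3.7) and checks it against a direct simulation of that limiting stage process.

```python
import random


def pi_m(eta1: float, eta0: float, m: int) -> float:
    """Limiting probability (3.7) that the M-stage RNN rule chooses pi_1."""
    return eta1 * sum(eta0 ** i for i in range(m)) + 0.5 * eta0 ** m


def simulate_stages(eta1: float, eta0: float, m: int,
                    trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo version of the limiting stage process (independent stages assumed)."""
    rng = random.Random(seed)
    hits = 0.0
    for _ in range(trials):
        for _stage in range(m):
            r = rng.random()
            if r < eta1:              # both neighbors are X-observations
                hits += 1.0
                break
            if r < eta1 + eta0:       # a tie: delete U_i, V_i, go to the next stage
                continue
            break                     # both neighbors are Y-observations
        else:                         # tie at the final stage M: randomize
            hits += 0.5
    return hits / trials


if __name__ == "__main__":
    eta1, eta0 = 0.5, 0.3             # arbitrary illustrative values
    for m in (1, 2, 5):
        print(m, pi_m(eta1, eta0, m), simulate_stages(eta1, eta0, m))
    print("M -> infinity:", eta1 / (1.0 - eta0))
```

With $\eta_1 = 0.5$ and $\eta_0 = 0.3$, (3.7) gives $0.65$ for $M = 1$ and $0.695$ for $M = 2$, increasing towards $\eta_1/(1 - \eta_0) = 5/7$ as $M \to \infty$.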

Suppose $\varphi_1 = \tfrac12$. Delete the observations corresponding to $U_1$ and $V_1$ from the pooled training sample. Denote the remaining $X$-observations and $Y$-observations by $X_i^{(2)}$ $(i = 1, \ldots, n_1 - 1)$ and $Y_j^{(2)}$ $(j = 1, \ldots, n_2 - 1)$, respectively, maintaining the order of the original subscripts.

Lemma 3.1. Given $Z = z$, $\varphi_1 = \tfrac12$, $U_1 = u_1$, $V_1 = v_1$, the conditional distribution of the $X_i^{(2)}$'s and $Y_j^{(2)}$'s is given as follows.

(i) The $X_i^{(2)}$'s and $Y_j^{(2)}$'s are mutually independent.

(ii) The density of $X_i^{(2)}$ is

$$f_1^{(2)}(x) = f_1(x)/[1 - \{F_1(v_1) - F_1(u_1)\}] \tag{3.8}$$

on the complement of $[u_1, v_1]$.

(iii) The density of $Y_j^{(2)}$ is

$$f_2^{(2)}(y) = f_2(y)/[1 - \{F_2(v_1) - F_2(u_1)\}] \tag{3.9}$$

on the complement of $[u_1, v_1]$.
Lemma 3.1 can be extended inductively, along similar lines, to the following. Suppose $\varphi_j = \tfrac12$ $(j = 1, \ldots, i-1)$. Delete the observations corresponding to $U_j$ and $V_j$ $(j = 1, \ldots, i-1)$ and denote the remaining $n_1 - i + 1$ $X$-observations and $n_2 - i + 1$ $Y$-observations by $X_t^{(i)}$ $(t = 1, \ldots, n_1 - i + 1)$ and $Y_r^{(i)}$ $(r = 1, \ldots, n_2 - i + 1)$, respectively, maintaining the order of the original subscripts.

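Lemma 3.2. Given $Z = z$, $U_j = u_j$, $V_j = v_j$, $\varphi_j = \tfrac12$ $(j = 1, \ldots, i-1)$, the conditional distribution of the $X_t^{(i)}$'s and $Y_r^{(i)}$'s is given as follows.

(i) The $X_t^{(i)}$'s and $Y_r^{(i)}$'s are mutually independent.

(ii) The density of $X_t^{(i)}$ is

$$f_1^{(i)}(x) = f_1^{(i-1)}(x)/[1 - \{F_1^{(i-1)}(v_{i-1}) - F_1^{(i-1)}(u_{i-1})\}] \tag{3.10}$$

on the complement of $[u_{i-1}, v_{i-1}]$, where $F_1^{(i-1)}$ is the c.d.f. corresponding to $f_1^{(i-1)}$, defined inductively by (3.10) and (3.8).

(iii) The density of $Y_r^{(i)}$ is

$$f_2^{(i)}(y) = f_2^{(i-1)}(y)/[1 - \{F_2^{(i-1)}(v_{i-1}) - F_2^{(i-1)}(u_{i-1})\}] \tag{3.11}$$

on the complement of $[u_{i-1}, v_{i-1}]$, where $F_2^{(i-1)}$ is the c.d.f. corresponding to $f_2^{(i-1)}$, defined inductively by (3.11) and (3.9).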
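Because the deleted intervals are nested, $u_{i-1} < \cdots < u_1 < z < v_1 < \cdots < v_{i-1}$, the recursion (3.10) telescopes. The following one-step check is our own computation, not part of the lemma as stated. Since $f_1^{(2)}$ vanishes on $[u_1, v_1] \subset [u_2, v_2]$,

$$F_1^{(2)}(v_2) - F_1^{(2)}(u_2) = \frac{\{F_1(v_2) - F_1(u_2)\} - \{F_1(v_1) - F_1(u_1)\}}{1 - \{F_1(v_1) - F_1(u_1)\}}, \qquad 1 - \{F_1^{(2)}(v_2) - F_1^{(2)}(u_2)\} = \frac{1 - \{F_1(v_2) - F_1(u_2)\}}{1 - \{F_1(v_1) - F_1(u_1)\}},$$

so that (3.10) with $i = 3$ gives

$$f_1^{(3)}(x) = \frac{f_1(x)}{1 - \{F_1(v_2) - F_1(u_2)\}}, \qquad x \notin [u_2, v_2].$$

By induction, $f_1^{(i)}(x) = f_1(x)/[1 - \{F_1(v_{i-1}) - F_1(u_{i-1})\}]$ off $[u_{i-1}, v_{i-1}]$, and likewise for $f_2^{(i)}$.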

The above two lemmas can be proved following the line of proof of a similar theorem in the one-sample case given by Anderson [1]. Their straightforward but lengthy proofs are omitted.


Theorem 3.1. Under the assumptions of Theorem 2.1, the limiting probability of classifying $Z$ into $\pi_1$ using the $M$-stage RNN rule, given $Z = z$, is given by (3.7) for almost all $z$ (under $f_1$ or $f_2$).
Proof. As in Section 2, the conditional probabilities of $\varphi_i = 1$ and $\varphi_i = \tfrac12$, given $Z = z$, $U_j = u_j$, $V_j = v_j$ $(j = 1, \ldots, i)$ and $\varphi_j = \tfrac12$ $(j = 1, \ldots, i-1)$, are respectively given by $C_1^{(i)}/B^{(i)}$ and $C_0^{(i)}/B^{(i)}$, where $C_1^{(i)}, C_2^{(i)}, C_0^{(i)}, B^{(i)}$ are obtained from $C_1, C_2, C_0, B$, respectively (see (2.16)-(2.19)), after replacing $n_1, n_2, u, v, f_1, f_2$ by $n_1 - i + 1, n_2 - i + 1, u_i, v_i, f_1^{(i)}, f_2^{(i)}$, respectively. Note that if the $f_j$'s are continuous at $z$ and $u_i$ and $v_i$ tend to $z$, then $f_j^{(i)}(u_i) \to f_j(z)$ and $f_j^{(i)}(v_i) \to f_j(z)$ $(j = 1, 2)$ as $n \to \infty$. Then the limiting values of $C_1^{(i)}/B^{(i)}$ and $C_0^{(i)}/B^{(i)}$ are respectively $\eta_1$ and $\eta_0$. Now (3.5) and (3.6) follow from Lemma 2.2 and the dominated convergence theorem. As in Theorem 2.1, we can introduce the sets $A_i$ (see after (2.8)) and argue as in (2.9)-(2.11). Now (3.7) follows from (3.2)-(3.6).

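The limiting PMC's of the $M$-stage RNN rule are given as follows:

$$\alpha_1^{(M)} = \lim_{n\to\infty} P[M\text{-stage RNN rule decides } Z \in \pi_2 \mid Z \in \pi_1] = \int [1 - \pi^{(M)}(z)] f_1(z)\,dz, \tag{3.12}$$

$$\alpha_2^{(M)} = \lim_{n\to\infty} P[M\text{-stage RNN rule decides } Z \in \pi_1 \mid Z \in \pi_2] = \int \pi^{(M)}(z) f_2(z)\,dz. \tag{3.13}$$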

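Again, in the case of a training sample from the mixed population, we may take $p_i = \xi_i$ $(i = 1, 2)$. Then the limiting value of the total PMC (or the Bayes' risk) for the $M$-stage RNN rule is

$$r^{(M)} = \xi_1\alpha_1^{(M)} + \xi_2\alpha_2^{(M)} = \int \Big[\tfrac12\eta_0\sum_{i=0}^{M-1}\eta_0^i + \tfrac12\eta_0^M\Big]\{\xi_1 f_1(z) + \xi_2 f_2(z)\}\,dz, \tag{3.14}$$

where $\eta_0 = \eta_0(z)$ is given by (2.23) with $p_i = \xi_i$. Since $\eta_0 \le \tfrac12$ (because $4\xi_1\xi_2 f_1(z) f_2(z) \le \{\xi_1 f_1(z) + \xi_2 f_2(z)\}^2$), the integrand in (3.14) does not increase with $M$, so that

$$r^{(M)} \le r^{(M-1)}. \tag{3.15}$$

Moreover,

$$r^{(\infty)} = \lim_{M\to\infty} r^{(M)} = \int \Big[\xi_1\xi_2 f_1(z) f_2(z)\{\xi_1 f_1(z) + \xi_2 f_2(z)\}/\{\xi_1^2 f_1^2(z) + \xi_2^2 f_2^2(z)\}\Big]\,dz. \tag{3.16}$$

It can be seen that

$$r^* \le r^{(\infty)}, \tag{3.17}$$

where $r^*$ is the minimum Bayes' risk as given by (2.29).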
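The monotonicity in $M$ and the comparison with $r^*$ can be checked numerically. The Python sketch below evaluates $r^{(M)}$ directly from (3.7), (3.12) and (3.13) with $p_i = \xi_i$, rather than from a closed form, so it serves as an independent check. The normal populations $N(0,1)$ and $N(1,1)$, $\xi_1 = \tfrac12$, the truncation of the real line to $[-12, 12]$ and the function names are all assumptions for illustration.

```python
import math
from typing import Callable, Optional

Fn = Callable[[float], float]


def total_pmc(xi1: float, f1: Fn, f2: Fn, m: Optional[int] = None,
              lo: float = -12.0, hi: float = 12.0, steps: int = 24000) -> float:
    """r^(M) = xi_1 alpha_1^(M) + xi_2 alpha_2^(M), or r^(inf) when m is None.

    Uses pi^(M) from (3.7) with p_i = xi_i inside (3.12)-(3.13); the
    integrals are approximated by the midpoint rule on the assumed range [lo, hi].
    """
    xi2 = 1.0 - xi1
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * h
        a, b = xi1 * f1(z), xi2 * f2(z)
        s = a + b
        if s == 0.0:
            continue
        eta1, eta0 = (a / s) ** 2, 2.0 * a * b / s ** 2
        if m is None:                        # limit of (3.7) as M -> infinity
            pi = eta1 / (1.0 - eta0)
        else:
            pi = eta1 * sum(eta0 ** i for i in range(m)) + 0.5 * eta0 ** m
        total += (a * (1.0 - pi) + b * pi) * h
    return total


def bayes_risk(xi1: float, f1: Fn, f2: Fn, lo: float = -12.0,
               hi: float = 12.0, steps: int = 24000) -> float:
    """r* of (2.29) by the same midpoint rule."""
    h = (hi - lo) / steps
    zs = (lo + (k + 0.5) * h for k in range(steps))
    return sum(min(xi1 * f1(z), (1.0 - xi1) * f2(z)) * h for z in zs)


def f1(x: float) -> float:                   # assumed example: N(0, 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f2(x: float) -> float:                   # assumed example: N(1, 1)
    return f1(x - 1.0)


if __name__ == "__main__":
    for m in (1, 2, 3, 5):
        print(m, round(total_pmc(0.5, f1, f2, m), 5))
    print("inf", round(total_pmc(0.5, f1, f2), 5))
    print("r*", round(bayes_risk(0.5, f1, f2), 5))
```

For this example $r^*$ equals $\Phi(-\tfrac12)$, since the two weighted densities cross at $z = \tfrac12$.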

4. Estimation of PMC's of the one-stage RNN rule.

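We shall estimate the PMC's of the one-stage RNN rule by the deleted counting method, described as follows. Let

$$x^{(i)} = 1 - \varphi_1(X_i; X_j\text{'s}, Y_l\text{'s};\ j \ne i), \qquad i = 1, \ldots, n_1, \tag{3.18}$$

$$y^{(k)} = \varphi_1(Y_k; X_j\text{'s}, Y_l\text{'s};\ l \ne k), \qquad k = 1, \ldots, n_2, \tag{3.19}$$

where $\varphi_1$ is given by (2.8). Let

$$p_x(n_1, n_2) = \sum_{i=1}^{n_1} x^{(i)}/n_1, \tag{3.20}$$

$$p_y(n_1, n_2) = \sum_{k=1}^{n_2} y^{(k)}/n_2, \tag{3.21}$$

$$p(n_1, n_2) = \{n_1 p_x(n_1, n_2) + n_2 p_y(n_1, n_2)\}/(n_1 + n_2). \tag{3.22}$$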

Then $p_x$ and $p_y$ can be used as estimates of the PMC's. Note that

$$E\,p_x(n_1, n_2) = \int [1 - \pi^{(1)}(z; n_1 - 1, n_2)] f_1(z)\,dz, \tag{3.23}$$

$$E\,p_y(n_1, n_2) = \int \pi^{(1)}(z; n_1, n_2 - 1) f_2(z)\,dz. \tag{3.24}$$

Order the observations in the training sample and denote the number of $X$-runs and the number of $Y$-runs by $r_1$ and $r_2$, respectively. Then it can be seen that

$$n_1 p_x(n_1, n_2) = r_1 - \epsilon_1, \tag{3.25}$$

$$n_2 p_y(n_1, n_2) = r_2 - \epsilon_2, \tag{3.26}$$

where $0 \le \epsilon_i \le 1$ $(i = 1, 2)$ are the contributions arising from the extreme observations. Let $r$ be the total number of runs. Thus, using (2.26) and (2.27), we get


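$$\lim_{n\to\infty} E(r_1/n_1) = \alpha_1^{(1)} = \int [p_2 f_1(z) f_2(z)/\{p_1 f_1(z) + p_2 f_2(z)\}]\,dz, \tag{3.27}$$

$$\lim_{n\to\infty} E(r_2/n_2) = \alpha_2^{(1)} = \int [p_1 f_1(z) f_2(z)/\{p_1 f_1(z) + p_2 f_2(z)\}]\,dz, \tag{3.28}$$

and

$$\lim_{n\to\infty} E\{r/(n_1 + n_2)\} = \int [2 p_1 p_2 f_1(z) f_2(z)/\{p_1 f_1(z) + p_2 f_2(z)\}]\,dz. \tag{3.29}$$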

The result (3.29) is well known in the theory of runs; it was derived by Wald and Wolfowitz [5]. Now the result (3.29) may be used to give short proofs of (2.26) and (2.27), after noting the fact that $|r_1 - r_2| = 0$ or $1$. Similar estimates of the PMC's of the multistage RNN rules can be obtained; however, they cannot be reduced as easily as in (3.25) and (3.26).
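A small Python sketch of the deleted counting estimates (3.18)-(3.21), together with a count of the runs, so that the relations (3.25)-(3.26) can be checked on a concrete sample. Exact fractions keep the tie value $\tfrac12$ exact; the helper names and the sample values are hypothetical, and tied values are assumed not to occur.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def phi1(z: float, xs: Sequence[float], ys: Sequence[float]) -> Fraction:
    """phi_1 of (2.8) for the point z against training samples xs (pi_1), ys (pi_2).

    Returns 1, 1/2 or 0 as an exact Fraction; no tied values are assumed.
    """
    pooled = sorted([(x, 1) for x in xs] + [(y, 2) for y in ys])
    below = [lab for v, lab in pooled if v < z]
    above = [lab for v, lab in pooled if v > z]
    if not below:                    # z is the smallest: NN is on the right
        return Fraction(int(above[0] == 1))
    if not above:                    # z is the largest: NN is on the left
        return Fraction(int(below[-1] == 1))
    if below[-1] == above[0]:        # U_1 and V_1 from the same class
        return Fraction(int(below[-1] == 1))
    return Fraction(1, 2)            # a tie


def deleted_counts(xs: List[float], ys: List[float]) -> Tuple[Fraction, Fraction]:
    """n1 * p_x and n2 * p_y of (3.18)-(3.21): each point is left out in turn."""
    sx = sum((1 - phi1(x, xs[:i] + xs[i + 1:], ys) for i, x in enumerate(xs)),
             Fraction(0))
    sy = sum((phi1(y, xs, ys[:k] + ys[k + 1:]) for k, y in enumerate(ys)),
             Fraction(0))
    return sx, sy


def run_counts(xs: List[float], ys: List[float]) -> Tuple[int, int]:
    """Numbers r1, r2 of X-runs and Y-runs in the ordered training sample."""
    labels = [lab for _, lab in sorted([(x, 1) for x in xs] + [(y, 2) for y in ys])]
    runs = {1: 0, 2: 0}
    for i, lab in enumerate(labels):
        if i == 0 or labels[i - 1] != lab:   # a new run starts here
            runs[lab] += 1
    return runs[1], runs[2]


if __name__ == "__main__":
    xs, ys = [0.1, 0.2, 0.5, 0.9], [0.3, 0.4, 0.7]
    print(deleted_counts(xs, ys), run_counts(xs, ys))
```

For `xs = [0.1, 0.2, 0.5, 0.9]` and `ys = [0.3, 0.4, 0.7]` the ordered labels are X X Y Y X Y X, so $r_1 = 3$ and $r_2 = 2$, while the deleted counts are $n_1 p_x = 5/2$ and $n_2 p_y = 2$. The extreme-observation corrections in (3.25)-(3.26) are therefore $\tfrac12$ and $0$ here: an interior run always contributes exactly one, whereas a run containing the smallest or largest observation may contribute only $\tfrac12$.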


Note 1. Suppose the c.d.f. of $Z$ is $F$. For the one-stage RNN rule, the conditional probability of classifying $Z$ into $\pi_2$, given the training sample, is derived as follows. Let

$$T_1 < T_2 < \cdots < T_{n_1+n_2} \tag{3.30}$$

be the ordered values of the observations in the training sample. Write

$$\theta_i = \begin{cases} 0, & \text{if } T_i \text{ is an } X\text{-observation},\\ 1, & \text{if } T_i \text{ is a } Y\text{-observation}. \end{cases} \tag{3.31}$$

Then the conditional probability of classifying $Z$ into $\pi_2$ using the one-stage RNN rule, given the training sample, is

$$\theta_1 F(T_1) + \tfrac12\sum_{i=1}^{n_1+n_2-1} [F(T_{i+1}) - F(T_i)](\theta_{i+1} + \theta_i) + [1 - F(T_{n_1+n_2})]\theta_{n_1+n_2}. \tag{3.32}$$

The behavior of (3.32) is under investigation.
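Expression (3.32) is a finite sum and can be evaluated directly. With $\theta_i = 1$ marking a $Y$-observation as in (3.31), the first and last terms cover $Z$ below $T_1$ and above the largest $T_i$, and each middle term covers $Z$ between two consecutive order statistics, where the rule gives $\pi_2$ with probability $(\theta_i + \theta_{i+1})/2$. The Python sketch below, with hypothetical function names, evaluates (3.32) and compares it with a Monte Carlo run of the rule, taking $F$ to be the Uniform(0, 1) c.d.f. as an assumed example.

```python
import bisect
import random
from typing import Callable, Sequence


def prob_classify_pi2(ts: Sequence[float], thetas: Sequence[int],
                      F: Callable[[float], float]) -> float:
    """Expression (3.32): P(one-stage RNN rule puts Z in pi_2 | training sample).

    ts must be sorted increasingly; thetas[i] is 0 for an X- and 1 for a
    Y-observation, as in (3.31).
    """
    prob = thetas[0] * F(ts[0])                  # Z below T_1: NN is T_1
    for i in range(len(ts) - 1):                 # Z between T_i and T_{i+1}
        prob += 0.5 * (F(ts[i + 1]) - F(ts[i])) * (thetas[i] + thetas[i + 1])
    prob += (1.0 - F(ts[-1])) * thetas[-1]       # Z above the largest T
    return prob


def simulate(ts: Sequence[float], thetas: Sequence[int],
             draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check with Z ~ Uniform(0, 1) (an assumed choice of F)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        k = bisect.bisect_left(ts, rng.random())  # number of T's below Z
        if k == 0:
            hits += thetas[0]
        elif k == len(ts):
            hits += thetas[-1]
        elif thetas[k - 1] == thetas[k]:
            hits += thetas[k]
        else:
            hits += rng.random() < 0.5           # tie: randomize
    return hits / draws


def uniform_cdf(t: float) -> float:
    return min(max(t, 0.0), 1.0)


if __name__ == "__main__":
    ts, thetas = [0.2, 0.5, 0.8], [0, 1, 1]      # X, Y, Y
    print(prob_classify_pi2(ts, thetas, uniform_cdf), simulate(ts, thetas))
```

For the ordered sample $0.2$ ($X$), $0.5$ ($Y$), $0.8$ ($Y$) under the uniform $F$, (3.32) equals $0 + 0.15 + 0.3 + 0.2 = 0.65$.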


Note 2. It will be quite useful to compare the NN rule and the different RNN rules when $n$ is small and under specific $F_1$ and $F_2$. Monte Carlo studies on these problems will be reported later.

Note 3. The results in Section 2 are taken from the Ph.D. thesis of the second author [4] and modified suitably.